Take-Home Exercise 1

Published

December 2, 2023

Modified

December 16, 2023

Setting the Scene

As urban systems digitize, encompassing buses, taxis, transit, utilities, and roads, the resulting datasets offer a comprehensive framework for tracing movement over both space and time. This surge in pervasive technologies like GPS and RFID integrated into vehicles amplifies this capability. For instance, smart cards and GPS devices in public buses collect extensive data on routes and ridership, potentially unveiling intricate patterns that divulge vital insights into measured phenomena. Analyzing and comparing these patterns could yield profound understanding of human behavior and movement within cities. These insights hold promise for enhancing urban management and providing crucial information to both public and private urban transport services for informed decision-making and competitive edge.

Despite the potential, the real-world application of this abundant location-aware data tends to be limited to basic tracking and mapping using GIS applications. This limitation stems from conventional GIS tools lacking the necessary functionality to effectively analyze and model spatial and spatio-temporal data.

Objectives

Exploratory Spatial Data Analysis (ESDA) hold tremendous potential to address complex problems facing society. Applying Geovisualisation and Analysis and Local Indicators of Spatial Association (GLISA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

Getting started

Loading all the package that could be use for this into R.

pacman::p_load(tmap, sf, sfdep, dplyr, mapview, ggpubr, tidyverse)

Geospatial Data

Import the BusStop data, this data included the information of all the bus stops located in Singapore. This data is provided by LTA DATA MALL.

BusStop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `/Users/pirapatchaiya/Library/Mobile Documents/com~apple~CloudDocs/SMU MITB NOV TERM/Applied Geospatial Analytics/ISSS624/ISSS624/Take-Home_Ex/Take-Home_Ex1/data/geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
tmap_mode("view")

Plotting the raw data, this plot show all the BusStops location

tm_shape(BusStop) + 
  tm_dots()

Hexagonal Grid Creation

The code chunk below uses the sf package to create a grid of polygons around bus stops with a specific cell size and then converts this grid into an sf object while assigning unique grid IDs to each cell.

grid = st_make_grid(BusStop, c(500), what = "polygons", square = FALSE)
# sf and add grid ID
grid_sf = st_sf(grid) %>%
  # add grid ID
  mutate(grid_id = 1:length(lengths(grid)))

This code chunk below determines how many bus stops intersect with each grid cell. Then, it filters out grid cells that don’t contain any bus stops, keeping only those cells with at least one bus stop inside.

grid_sf$n_colli = lengths(st_intersects(grid_sf, BusStop))

# remove grid without value of 0 (i.e. no points in side that grid)
grid_count = filter(grid_sf,n_colli > 0 )
tmap_mode("view")

Plotting hexagon map

tm_shape(grid_count) +
  tm_fill(
    col = "n_colli",  
    palette = "Blues",
    style = "cont",
    title = "Number of collisions",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.vars = c(
      "Number of collisions: " = "n_colli"
    ),
    popup.format = list(
      n_colli = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

Next, as we can see there are some busstop that is outside Singapore. In this case it will be exclude.

Removing the Bus Stops outside Singapore using the code below

grid_count_rm <- grid_count %>%
  filter(!grid_id == 1767,
         !grid_id == 2073,
         !grid_id == 2135,
         !grid_id == 2104)

New Plot after removed the Bus Stops outside Singapore

tm_shape(grid_count_rm) +
  tm_fill(
    col = "n_colli",  
    palette = "Blues",
    style = "cont",
    title = "Number of collisions",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.vars = c(
      "Number of collisions: " = "n_colli"
    ),
    popup.format = list(
      n_colli = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

Aspatial Data

Import the aspatial data into R, the data that will be use here is the “origin_destination_bus_202310”. This data is originally from the LTA DATA MALL.

busTrips <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
# busTrips <- read_csv("data/aspatial/origin_destination_bus_202308.csv")
# busTrips <- read_csv("data/aspatial/origin_destination_bus_202309.csv")

busTrips$ORIGIN_PT_CODE <- as.factor(busTrips$ORIGIN_PT_CODE)
busTrips$DESTINATION_PT_CODE <- as.factor(busTrips$DESTINATION_PT_CODE)

Data Wrangling

Aspatial Data Wrangling

Calculating bus trip in weekday and morning peak hour

busTripsDayMorning <- busTrips %>%
  filter(DAY_TYPE == "WEEKDAY", 
         TIME_PER_HOUR >= 6, 
         TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(WeekdayMorningTrips = sum(TOTAL_TRIPS))

Calculating bus trip in weekday and afternoon peak hour

busTripsDayAfternoon <- busTrips %>%
  filter(DAY_TYPE == "WEEKDAY", 
         TIME_PER_HOUR >= 17, 
         TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(WeekdayAfternoonTrips = sum(TOTAL_TRIPS))

Calculating bus trip in weekend and morning peak hour

busTripsEndMorning <- busTrips %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY", 
         TIME_PER_HOUR >= 11, 
         TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(WeekendMorningTrips = sum(TOTAL_TRIPS))

Calculating bus trip in weekend and evening peak hour

busTripsEndEvening <- busTrips %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY", 
         TIME_PER_HOUR >= 16, 
         TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE) %>%
  summarise(WeekendEveningTrips = sum(TOTAL_TRIPS))

Joining all the Peak Trips

BusTrips_comb <- busTripsDayMorning %>%
  left_join(busTripsDayAfternoon) %>%
  left_join(busTripsEndMorning) %>%
  left_join(busTripsEndEvening)

Geospatial Data Wrangling

grid_bus <- st_join(grid_count_rm,BusStop,join = st_contains) %>%
  unnest() %>%
  select(grid_id,BUS_STOP_N)
grid_bus$BUS_STOP_N <- as.factor(grid_bus$BUS_STOP_N)

Joining aspatial and geospatial

Joining both aspatial and geospatail data together

Trips <- left_join(BusTrips_comb,grid_bus,
                   by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  group_by(grid_id)%>%
  summarise(WeekdayMorningTrips = sum(WeekdayMorningTrips),
            WeekdayAfternoonTrips = sum(WeekdayAfternoonTrips),
            WeekendMorningTrips = sum(WeekendMorningTrips),
            WeekendEveningTrips = sum(WeekendEveningTrips))
Trips <- left_join(grid_count_rm,Trips) %>%
  mutate (Total_Trips = WeekdayMorningTrips+WeekdayAfternoonTrips+WeekendMorningTrips  +WeekendEveningTrips) %>% 
  rename (n_bus = n_colli)

Trip amount per bus stop, to determine whether the quantity of bus stations or whether it is indeed crowded.

TripsPerBusStop <- Trips %>%
  mutate (WeekdayMorningTrips = WeekdayMorningTrips/n_bus,
          WeekdayAfternoonTrips = WeekdayAfternoonTrips/n_bus,
          WeekendMorningTrips = WeekendMorningTrips/n_bus,
          WeekendEveningTrips = WeekendEveningTrips/n_bus)
TripsLog <- Trips %>%
  mutate (WeekdayMorningTrips = log(WeekdayMorningTrips),
          WeekdayAfternoonTrips = log(WeekdayAfternoonTrips),
          WeekendMorningTrips = log(WeekendMorningTrips),
          WeekendEveningTrips = log(WeekendEveningTrips))

Visualising

tmap_mode("view")

The map below displays the total bus trip for each bus station of origin.

tm_shape(Trips) +
  tm_fill(
    col = "Total_Trips",
    palette = "Blues",
    style = "quantile",
    title = "Total Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.format = list(
      Total_Trips = list(format = "f", digits = 0)
    )
  ) +
  tm_borders(col = "grey40", lwd = 0.7)

It is evident from the above plot of total trips that bus travel is dispersed throughout the nation. On the other hand, many clusters are visible.

Total Trips

weekday_morning <- tm_shape(Trips) +
  tm_fill(
    col = "WeekdayMorningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekday Morning Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekday_afternoon <- tm_shape(Trips) +
  tm_fill(
    col = "WeekdayAfternoonTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekday Afternoon Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekend_morning <- tm_shape(Trips) +
  tm_fill(
    col = "WeekendMorningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekend Morning Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekend_evening <- tm_shape(Trips) +
  tm_fill(
    col = "WeekendEveningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekend Evening Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
tmap_mode("plot")
tmap_arrange(weekday_morning, weekday_afternoon, weekend_morning, weekend_evening, ncol=2, nrow=2)

The segmented trips in the map above are broken down into four sector: weekend morning, weekend afternoon, and weekend morning. The distinctions between them are negligible. On the east side, there is a noticeable difference, though. On weekends and weekdays, the east side quantile is lower in the evening or afternoon than it is in the morning. Additionally, it is evident in the central that in the afternoon and evening, their quantile is higher.

Total Trips per Bus Stop

weekday_morning_per_stop <- tm_shape(TripsPerBusStop) +
  tm_fill(
    col = "WeekdayMorningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekday Morning Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekday_afternoon_per_stop <- tm_shape(TripsPerBusStop) +
  tm_fill(
    col = "WeekdayAfternoonTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekday Afternoon Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekend_morning_per_stop <- tm_shape(TripsPerBusStop) +
  tm_fill(
    col = "WeekendMorningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekend Morning Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekend_evening_per_stop <- tm_shape(TripsPerBusStop) +
  tm_fill(
    col = "WeekendEveningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekend Evening Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
tmap_mode("plot")
tmap_arrange(weekday_morning_per_stop, weekday_afternoon_per_stop,
             weekend_morning_per_stop, weekend_evening_per_stop,
            ncol=2, nrow=2) 

The diagram above demonstrates how the overall number of trips varies depending on the day and time of day. Each bus station in a hexagon has a different amount of total trips within it. Still, the outcomes are strikingly identical to the last one.

Log value of trips

weekday_morning_log <- tm_shape(TripsLog) +
  tm_fill(
    col = "WeekdayMorningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekday Morning Trips per Bus Stop",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekday_afternoon_log <- tm_shape(TripsLog) +
  tm_fill(
    col = "WeekdayAfternoonTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekday Afternoon Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6,
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekend_morning_log <- tm_shape(TripsLog) +
  tm_fill(
    col = "WeekendMorningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekend Morning Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
weekend_evening_log <- tm_shape(TripsLog) +
  tm_fill(
    col = "WeekendEveningTrips",
    palette = "Blues",
    style = "quantile",
    title = "Weekend Evening Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.6
  ) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_legend(scale = 0.5)
tmap_mode("plot")
tmap_arrange(weekday_morning_log, weekday_afternoon_log,
             weekend_morning_log, weekend_evening_log, ncol=2, nrow=2)

When the number is logarithmic, the value of the total number of trips for each hexagon is displayed in the figure above. Additionally, there aren’t many differences between them.

Local Indicators of Spatial Association (LISA) Analysis

Spatial Weight

wm_idw <- Trips %>%
  mutate(nb = st_dist_band(grid),
         wts = st_inverse_distance(nb, grid,
                                   scale = 1,
                                   alpha = 1),
         .before = 1) %>%
  mutate(grid_id = 1:length(Trips$grid_id))

Local Moran

Weekday Morning

lisa_WDM <- wm_idw %>% 
  mutate(local_moran = local_moran(
    WeekdayMorningTrips, nb, wts, nsim = 99, na.action=na.pass),
         .before = 1) %>%
  unnest(local_moran)
ii_val_moran_WDM <- tm_shape(lisa_WDM) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of Trip",
            main.title.size = 0.8)
p_val_moran_WDM <- tm_shape(lisa_WDM) +
  tm_fill("p_ii_sim") + 
  tm_borders(alpha = 0.5) +
   tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)
tmap_arrange(ii_val_moran_WDM, p_val_moran_WDM, ncol = 2)

lisa_sig_WDM <- lisa_WDM  %>%
  filter(p_ii_sim < 0.05)
pal <- RColorBrewer::brewer.pal("Set1", n = 5)

weekday_morning_localmoran <- tm_shape(lisa_WDM) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_WDM) +
  tm_fill("mean", palette = pal , title = "Weekday Morning Trips") + 
  tm_borders(alpha = 0.4)

Weekday Afternoon

lisa_WDA <- wm_idw %>% 
  mutate(local_moran = local_moran(
    WeekdayAfternoonTrips, nb, wts, nsim = 99, na.action=na.pass),
         .before = 1) %>%
  unnest(local_moran)
ii_val_moran_WDA <- tm_shape(lisa_WDA) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of Trip",
            main.title.size = 0.8)
p_val_moran_WDA <- tm_shape(lisa_WDA) +
  tm_fill("p_ii_sim") + 
  tm_borders(alpha = 0.5) +
   tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)
tmap_arrange(ii_val_moran_WDA, p_val_moran_WDA, ncol = 2)

lisa_sig_WDA <- lisa_WDA  %>%
  filter(p_ii_sim < 0.05)
pal <- RColorBrewer::brewer.pal("Set1", n = 5)

weekday_afternoon_localmoran <- tm_shape(lisa_sig_WDA) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_WDA) +
  tm_fill("mean", palette = pal , title = "Weekday Afternoon Trips") + 
  tm_borders(alpha = 0.4)

Weekend Morning

lisa_WEM <- wm_idw %>% 
  mutate(local_moran = local_moran(
    WeekendMorningTrips, nb, wts, nsim = 99, na.action=na.pass),
         .before = 1) %>%
  unnest(local_moran)
ii_val_moran_WEM <- tm_shape(lisa_WEM) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of Trip",
            main.title.size = 0.8)
p_val_moran_WEM <- tm_shape(lisa_WEM) +
  tm_fill("p_ii_sim") + 
  tm_borders(alpha = 0.5) +
   tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)
tmap_arrange(ii_val_moran_WEM, p_val_moran_WEM, ncol = 2)

lisa_sig_WEM <- lisa_WEM  %>%
  filter(p_ii_sim < 0.05)
pal <- RColorBrewer::brewer.pal("Set1", n = 5)

weekend_morning_localmoran <- tm_shape(lisa_WEM) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_WEM) +
  tm_fill("mean", palette = pal , title = "Weekend Morning Trips") + 
  tm_borders(alpha = 0.4)

Weekend Evening

lisa_WEE <- wm_idw %>% 
  mutate(local_moran = local_moran(
    WeekendEveningTrips, nb, wts, nsim = 99, na.action=na.pass),
         .before = 1) %>%
  unnest(local_moran)
ii_val_moran_WEE <- tm_shape(lisa_WEE) +
  tm_fill("ii") + 
  tm_borders(alpha = 0.5) +
  tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "local Moran's I of Trip",
            main.title.size = 0.8)
p_val_moran_WEE <- tm_shape(lisa_WEE) +
  tm_fill("p_ii_sim") + 
  tm_borders(alpha = 0.5) +
   tm_layout(main.title = "p-value of local Moran's I",
            main.title.size = 0.8)
tmap_arrange(ii_val_moran_WEE, p_val_moran_WEE, ncol = 2)

lisa_sig_WEE <- lisa_WEE  %>%
  filter(p_ii_sim < 0.05)
pal <- RColorBrewer::brewer.pal("Set1", n = 5)

weekend_evening_localmoran <- tm_shape(lisa_WEE) +
  tm_polygons() +
  tm_borders(alpha = 0.5) +
tm_shape(lisa_sig_WEE) +
  tm_fill("mean", palette = pal , title = "Weekend Evening Trips") + 
  tm_borders(alpha = 0.4)

LISA Map

Summary all the 4 category, below is the 4 LISA Map in 4 category (weekday_morning, weekday_afternoon, weekend_morning, weekend_evening) using tmap_arrange.

tmap_arrange(weekday_morning_localmoran, weekday_afternoon_localmoran,
             weekend_morning_localmoran, weekend_evening_localmoran,
             ncol = 2, nrow = 2)

Conclusion

Using the inverse distance weight, the map above displays the local Moran relationship between each hexagonal grid. The 2 top row is weekday morning and weekday afternoon. The 2 in bottom row is weekend morning and weekend afternoon.

From the graph above compare the different between weekday and weekend. On weekday most of the clustering is Low-High follow by High-Low, there is only on Low-Low on weekday morning. For Weekend, there are more Low-Low and High-High compare to weekday. There is more mix cluster on weekend.

The fact that individuals left for work or school in the morning and returned home in the afternoon lends credence to both of the weekday graphs. As it can observed from the map, on weekday mornings the spread occurs more in the suburbs of Singapore, however in the afternoon it is closer to the city core.The weekend graphs show that throughout the morning, the most of the dispersion and clustering occurred in the central region, and during the night, it was slightly dispersed.